Regex Extension: Grouping

Part of the Regex Documentation website.

Grouping

Parentheses in regular expressions serve two distinct functions:
1) limiting the scope of alternation;
2) identifying subexpressions (also known as groups) that are treated as a single unit.

Using parentheses to limit the scope of alternation is discussed elsewhere.
Identifying a subexpression allows two things:
1) applying a quantifier to more than a single character in the regular expression;

Example
s = "Family cars are built for 4 people, but the average family has 4.2 people in it";
regex.extract ("[0-9]+(\\.[0-9]*)?", @s, @temp.regexList)
   » 2
Because the "?" quantifier applies to the whole group "(\\.[0-9]*)", the decimal point and any digits after it are optional together, so both "4" and "4.2" are matched: 2 matches in all.
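For comparison, here is a rough analogue using Python's standard re module (not the Frontier regex extension). Note that Python's re.findall returns only the group's text when a pattern contains a group, so this sketch collects the whole match via re.finditer instead.

```python
import re

s = "Family cars are built for 4 people, but the average family has 4.2 people in it"

# "?" applies to the whole group "(\.[0-9]*)", so the decimal point and
# the digits after it are optional together.
pattern = re.compile(r"[0-9]+(\.[0-9]*)?")

# group(0) is the text matched by the whole pattern, not just the group.
matches = [m.group(0) for m in pattern.finditer(s)]
print(len(matches), matches)  # 2 ['4', '4.2']
```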

2) capturing the text matched by each subexpression, in addition to the text matched by the whole regular expression.
Verbs that use the MatchInfo table can retrieve this captured text from its groupStrings cell.
Captured subexpressions are returned in the order of their opening parentheses.

Example
s = "Subject: What's today's lesson\r";
regex.easySearch ("(Subject:) +([^\r]*)\r", s, @temp.matchInfo)
   » true
In the example, the matched subexpressions are in the list at temp.matchInfo.groupStrings: {"Subject:", "What's today's lesson"}
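The same capture can be sketched with Python's re module as an analogue; here match.groups() plays roughly the role of the groupStrings list (a correspondence assumed for illustration, not Frontier's API). The second pattern shows that nested groups are numbered by the position of their opening parentheses.

```python
import re

s = "Subject: What's today's lesson\r"
m = re.search(r"(Subject:) +([^\r]*)\r", s)
group_strings = list(m.groups())
print(group_strings)  # ['Subject:', "What's today's lesson"]

# Nested groups: group 1 opens first and encloses groups 2 and 3.
nested = re.match(r"((a)(b))", "ab")
print(nested.groups())  # ('ab', 'a', 'b')
```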

Back references
Regex numbers the captured subexpressions, so a later part of the same regular expression can refer to the text a group matched by using the back-reference metacharacter "\digit", where digit is an integer from 1 to 9.
Inside a Frontier string literal the backslash must itself be escaped, as "\\", so that a single backslash reaches the regex engine.
Example - finding doubled words
s = "The big black black dog";
regex.easySearch ("\\<(\\w+)\\s+\\1", s, @temp.matchInfo)
   » true
The pattern "\\<(\\w+)" matches a word: "\\<" anchors the match at the start of a word, and "\\w+" matches the word's characters, which the group "(\\w+)" captures. After one or more whitespace characters ("\\s+"), "\\1" requires the same captured text to appear again.
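A Python analogue of the doubled-word search follows, assuming "\b" as a stand-in for Frontier's "\\<" (Python's re has no "\<" operator). Raw strings (r"...") avoid the extra escaping that Frontier string literals need. The sketch also shows that without a trailing word boundary, "\1" can match the beginning of a longer word such as "blackboard"; adding "\b" after "\1" prevents that.

```python
import re

s = "The big black black dog"

# \b marks a word boundary; (\w+) captures the word; \1 must repeat it.
doubled = re.compile(r"\b(\w+)\s+\1")
m = doubled.search(s)
print(m.group(0), "|", m.group(1))  # black black | black

# Without a trailing boundary, \1 can match the start of a longer word.
print(doubled.search("black blackboard").group(0))  # black black

# A trailing \b requires the repeated word to end there.
strict = re.compile(r"\b(\w+)\s+\1\b")
print(strict.search("black blackboard"))  # None
```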

With regex.subst, back references can also be used in the replacement text.
Example - replacing a doubled word with a single copy
s = "The big black black dog";
regex.subst ("\\<(\\w+)\\s+\\1", "\\1", @s)
   » true
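A rough Python analogue using re.sub (not the Frontier verb itself): as with regex.subst, the back reference "\1" in the replacement text stands for the captured word, so the doubled word collapses to one copy. As before, Python's "\b" is assumed to stand in for "\\<".

```python
import re

s = "The big black black dog"

# "\1" in the replacement inserts the text captured by group 1.
s = re.sub(r"\b(\w+)\s+\1", r"\1", s)
print(s)  # The big black dog
```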

This page was last updated at Mon, 09 Nov 1998 19:54:19 GMT.
Please send all questions and comments to regex@lists.scriptmeridian.org.
Check our website for updates to the docs.